Lab 2. BIOE 515: Landscape Ecology & Management.

Types of spatial data.

Author

Travis Belote

Background and goals.

Throughout this course, we will be working with different kinds of spatial data. It is important to develop fluency with the formats and types of data. We will first cover these formats and types of data, then combine different formats to conduct some simple analyses in ArcGIS Pro. At the end of the lab you will: (1) know a little bit about geographic projections, (2) be able to describe different data formats and types, and (3) be able to conduct several simple but powerful spatial analyses using ArcGIS Pro. Please use the template provided to organize your lab write-up.

Geographic projections and coordinate reference systems

This lab will focus on types of data that landscape ecologists and spatial analysts typically work with, but first we need to understand the importance of geographic coordinate systems and projections. In R these are called coordinate reference systems (crs). A coordinate reference system is a mathematical way of representing our three-dimensional world on a two-dimensional map. We won’t cover these in depth here, but managing projections can be a challenge. You will have to address projections if you work with spatial data. We will cover these in more depth next week. The video below is helpful to understand why the same maps can look so different.

If the video player doesn’t work, watch the video here: https://www.youtube.com/watch?v=kIID5FDi2JQ
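
To make the idea of a projection concrete, here is a minimal Python sketch of one well-known projection, the spherical Mercator. The function names and the sample coordinates are illustrative assumptions and are not part of the lab data; real work should rely on a GIS or a projection library rather than hand-written formulas.

```python
import math

EARTH_RADIUS_M = 6378137.0  # radius of the sphere used by spherical (Web) Mercator

def mercator_xy(lon_deg, lat_deg, radius=EARTH_RADIUS_M):
    """Project longitude/latitude (degrees) to spherical Mercator x/y (meters)."""
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = radius * lam
    y = radius * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y

def mercator_scale_factor(lat_deg):
    """Factor by which Mercator stretches distances at a given latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))

# Hypothetical point roughly in the Yellowstone region (illustration only)
x, y = mercator_xy(-110.5, 44.6)
stretch_equator = mercator_scale_factor(0.0)
stretch_gye = mercator_scale_factor(44.6)
```

In this sketch, distances at the latitude of the example point are stretched by a factor of about 1.4 relative to the equator, which is one reason the same region can look so different from one map projection to another.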

Different formats and types of data.

Spatial data will be one of three formats:

(1) tables or tabular data (e.g., spreadsheets, data frames, attribute tables),

(2) vector (e.g., features, shapefiles, polygons), and

(3) raster (e.g., gridded map of pixels).

Raster and vector data will sometimes be called different kinds of “spatial data models”, but - given the variety of ways ecologists use “model” - I will avoid that here.

Below are examples of different kinds or formats of data.

Data organized and shared as these three formats (tabular, vector, raster) can be qualitative (categorical) or quantitative (numeric). This is a helpful chart from this online book. It is important to consider what kind of data you are working with and how the data can be combined to ask interesting questions.

Tables can obviously hold data of different kinds. Consider a table of field samples (e.g., “plots”) collected in a forest. Each plot might have data collected on the type of forest (e.g., aspen, Douglas-fir, lodgepole pine), the age of the forest (1 to 100s of years), number of trees for each species within the sample, number of species within each sample, average height of the forest stand, average seasonal temperature, etc. All of these data could be classified using the framework above.

Table of fictional forest plots with different kinds of data.

PlotID | Forest type | Elevation (m) | Stand age (yrs) | Tree density (trees per plot) | Tree richness (species per plot) | Stand height (m) | Avg summer temp (°C) | Wildfire severity
1A | Aspen | 1500.5 | 60 | 800 | 6 | 18.4 | 16.2 | Low
2A | Douglas-fir | 1900.1 | 120 | 600 | 8 | 25.1 | 14.4 | Unburned
3A | Aspen | 2100.3 | 70 | 700 | 5 | 15.9 | 13.8 | Moderate
4A | Lodgepole pine | 2400.8 | 90 | 1200 | 4 | 20.7 | 12.3 | High

TASK 1 (5 points).

For each variable in the table below, fill in the type of data it represents (nominal, ordinal, discrete, or continuous). Then consider what other data could be collected in the field to tell you something about the ecosystem: in the last four rows, list four additional variables, one for each of the four data types.

Variable | Data type (nominal, ordinal, discrete, or continuous?)
PlotID | Nominal
Forest type | __________________________
Elevation (m) | Continuous
Stand age (yrs) | __________________________
Tree density (trees per plot) | __________________________
Tree richness (species per plot) | Discrete
Stand height (m) | __________________________
Avg summer temp (°C) | __________________________
Wildfire severity | __________________________
__________________________ | Nominal
__________________________ | Ordinal
__________________________ | Discrete
__________________________ | Continuous

Raster data can represent both categorical and quantitative data. Consider land cover data, where pixels (small squares of known size) are each assigned a land cover category (forest, grassland, developed). Raster data can also hold continuous quantitative data like elevation, total ecosystem carbon (mass of carbon per area), or an index of habitat suitability. The example below shows both cases: the National Land Cover Database (NLCD) (categorical land cover types, top) and elevation (meters above sea level, bottom).

Vector data are either polygons (states, wilderness areas, discrete maps of a species distribution), lines (streams, roads), or points (locations of plots or samples). Vector data can also be used to represent categorical or continuous data as in the example below showing the polygons of level 4 ecoregions (categorical regions, top) and mean species richness within those polygons (average number of species).

Consider how categorical data can be either vector (left, below) or raster (right, below). It is also possible to transform data from vector to raster; we will do this in an R exercise next week. Image borrowed from Kim With’s Essentials of Landscape Ecology (Figure 4.14).
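
The vector-to-raster transformation mentioned above can be sketched in a few lines of Python. This is a simplified illustration that assumes a made-up polygon and grid; the function names are hypothetical, and a cell is assigned to the polygon when its center falls inside it, which is only one of several rules a GIS may use.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: is (x, y) inside the polygon given by (x, y) vertices?"""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray from (x, y) cross the edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(vertices, n_rows, n_cols, cell_size, x_min, y_max, value=1, background=0):
    """Return a grid (list of rows, top row first) with `value` where cell centers fall inside the polygon."""
    grid = []
    for row in range(n_rows):
        y = y_max - (row + 0.5) * cell_size
        grid_row = []
        for col in range(n_cols):
            x = x_min + (col + 0.5) * cell_size
            grid_row.append(value if point_in_polygon(x, y, vertices) else background)
        grid.append(grid_row)
    return grid

# Hypothetical triangular "forest patch" on a 4 x 4 grid of 10 m cells
patch = [(0, 0), (40, 0), (0, 30)]
forest_grid = rasterize(patch, n_rows=4, n_cols=4, cell_size=10, x_min=0, y_max=40)
for grid_row in forest_grid:
    print(grid_row)
```

Notice that the smooth diagonal edge of the polygon becomes a staircase of pixels, so the result depends on the cell size chosen.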

Data of different types can be related spatially through shared locations (e.g., plots collected within a national park where total ecosystem carbon has been estimated). Data of different kinds can also be combined based on relationships between attributes. Consider relating a database on the traits of different species (e.g., maximum height, lifespan, breeding habitat) with data on samples collected in plots of those species. You can relate the species observed in plots with the traits of those species. We will not spend much time on relational databases, but I recommend learning some data wrangling skills (e.g., left_join in R’s dplyr package).
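
As an illustration of relating tables through a shared attribute, here is a minimal Python sketch of a left join between plot observations and a species trait table. All table contents, column names, and the left_join helper are made up for this example; in R, dplyr's left_join serves the same purpose.

```python
def left_join(left_rows, right_rows, key):
    """Keep every row of left_rows; add matching columns from right_rows (None if no match)."""
    lookup = {row[key]: row for row in right_rows}
    right_columns = [c for c in right_rows[0] if c != key]
    joined = []
    for row in left_rows:
        match = lookup.get(row[key])
        merged = dict(row)
        for column in right_columns:
            merged[column] = match[column] if match else None
        joined.append(merged)
    return joined

# Hypothetical plot observations (one row per species observed in a plot)
observations = [
    {"plot_id": "1A", "species": "Populus tremuloides"},
    {"plot_id": "2A", "species": "Pseudotsuga menziesii"},
    {"plot_id": "4A", "species": "Pinus contorta"},
    {"plot_id": "4A", "species": "Unknown conifer"},
]

# Hypothetical trait table (values are placeholders, not real measurements)
traits = [
    {"species": "Populus tremuloides", "max_height_m": 25},
    {"species": "Pseudotsuga menziesii", "max_height_m": 60},
    {"species": "Pinus contorta", "max_height_m": 30},
]

plots_with_traits = left_join(observations, traits, key="species")
```

Every observation is kept, including the one with no match in the trait table, which simply receives a missing value. That behavior is what distinguishes a left join from a join that drops unmatched rows.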

When you know how to work with different kinds of data of different types, you can find creative ways of combining disparate datasets to ask really interesting research questions.

ArcGIS Pro exercises

We’re going to do a few simple queries and summaries using data clipped to the Greater Yellowstone Ecosystem. The steps below walk through opening ArcGIS Pro, navigating to the Lab 2 folder, and bringing in the data.

Open ArcGIS Pro, sign in with your NetID, and start a new project with a Map. Give the new project a reasonable name (e.g., Lab 2 - Sept 2, 2025).

When your map opens, you can either click on “Add Data” to add all of the layers from the lab folder OR you can find the “Catalog” window on the far right, select Computer, navigate to the folder, and drag and drop all layers into the map.

Bring in all of the data, look at each layer, and consider what kind of data these layers represent. Click off all layers in the Contents pane and examine each dataset individually. Zoom in and out and click around and customize the symbology in different ways. You could spend all day doing this, but spend 5 or 10 minutes getting to know the data and changing the symbology of some of the layers.

Right-click on the layer name and select Attribute Table. This will open the attribute table for the dataset, if it has one. The attribute table considered by itself is what kind of data?

TASK 2 (5 points).

Below is a table of data in this lab’s folder. List which type of data each layer represents (raster, vector, or tabular?).

Filename | Description | Source | Type (raster or vector?)
elevation_GYE.tif | Digital elevation model | LANDFIRE | __________________
FIAdataV2_basal_area_ft2_per_acre.shp | Forest Inventory and Analysis plots | Forest Service FIA | __________________
GAP_vertebrate_richness_GYE.tif | Richness of terrestrial vertebrates | USGS GAP | __________________
GYE_boundary.shp | Greater Yellowstone Ecosystem boundary | Greater Yellowstone Coordinating Cmte | __________________
NLCD_2024_GYE.tif | National Land Cover Database | USGS NLCD | __________________
NorthAmericanRivers_GYE.shp | Major rivers | Commission for Environmental Cooperation | __________________
PADUS4_1VectorAnalysis_GYE.shp | Protected Areas Database | USGS GAP | __________________
us_eco_l4_GYE.shp | Ecoregions (level 4) | EPA | __________________
usfs_carbon_total_initial_tons_per_acre_GYE.tif | Total forest carbon | USFS Firelab | __________________

Exercise 1: What is the landscape composition of the GYE?

We want to know how much of the Greater Yellowstone Ecosystem is forest, grassland, shrubland, or developed land (i.e., the composition of the landscape). This is a fundamental landscape measure discussed by Noss (1990) and shown in the upper-right of his diagram.

Click off all layers except for the land cover data (NLCD_2024_GYE.tif) and make sure you can see the different cover types in the Contents pane. One very basic question of landscape ecology is: “what is the composition of the landscape?” In other words, how much is there of each land cover type? We can assess this on an absolute (total hectares or acres) or relative (% of the landscape) basis. Before we use the GIS to calculate the relative amount of each land cover type, look at the map and see if you can guess how much of the Greater Yellowstone Ecosystem is forest, grassland, developed, etc.

Now, let’s calculate the percent area occupied by each cover type (i.e., the composition of the landscape).

Right-click on NLCD_2024_GYE.tif and select Open Attribute Table. The table includes “Value”, which is actually a nominal variable: it is the numeric land cover code for each named land cover class (NLCD_Land, the right-hand column). The table also includes “Count”, which is the number of pixels within that land cover class.

TASK 3 (10 points).

How would you calculate the percentage of each land cover class? Try to do this calculation using whatever means you’d like.

How would you calculate the absolute area of each land cover class?
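
One possible approach to both questions, sketched in Python with made-up pixel counts. The class names, the counts, and the 30 m cell size are assumptions for illustration only; check the properties of the raster for the actual cell size, and use the real Count values from the attribute table.

```python
# Hypothetical pixel counts per land cover class (not the real GYE values)
pixel_counts = {
    "Evergreen Forest": 500_000,
    "Shrub/Scrub": 300_000,
    "Grassland/Herbaceous": 150_000,
    "Developed": 50_000,
}

CELL_SIZE_M = 30  # assumed pixel width in meters; check the raster's properties
M2_PER_HECTARE = 10_000

total_pixels = sum(pixel_counts.values())
cell_area_ha = CELL_SIZE_M ** 2 / M2_PER_HECTARE

composition = {}
for land_cover, count in pixel_counts.items():
    composition[land_cover] = {
        "percent": 100 * count / total_pixels,  # relative composition
        "area_ha": count * cell_area_ha,        # absolute area
    }

for land_cover, summary in composition.items():
    print(f"{land_cover}: {summary['percent']:.1f}% ({summary['area_ha']:.0f} ha)")
```

The same arithmetic works in a spreadsheet or with a calculated field in the attribute table: divide each Count by the sum of all Counts for the percentage, and multiply each Count by the area of one pixel for the absolute area.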

Exercise 2: Where are the whitebark pines in the GYE?

The Forest Inventory and Analysis (FIA) program is a national monitoring effort that “collects, processes, analyzes, and reports on data necessary for assessing the extent and condition of forest resources in the United States.” I have downloaded and pre-processed FIA plots to create a plot x species matrix, a common way to organize data on ecological communities. The data are in the file FIAdataV2_basal_area_ft2_per_acre.shp. This shapefile includes the locations of FIA plots and an attribute table where rows are FIA plots and columns are species, with the cells representing total basal area for that species. See this cartoon explaining basal area (the cross-sectional area of tree trunks). Basal area is a common way to measure tree abundance in forests (density, or trees per area, is another).
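
To show what a plot x species matrix looks like, here is a minimal Python sketch that builds one from a list of individual tree records. The records, the species code Abies_lasi, and the function names are hypothetical. The sketch sums per-tree basal area in square feet and does not reproduce the per-acre scaling or any other step of the pre-processing used for the lab data.

```python
import math

def basal_area_ft2(dbh_inches):
    """Cross-sectional trunk area (square feet) from diameter at breast height (inches)."""
    radius_ft = dbh_inches / 2 / 12
    return math.pi * radius_ft ** 2

# Hypothetical tree records: (plot ID, species code, diameter in inches)
tree_records = [
    ("P1", "Pinus_albi", 12.0),
    ("P1", "Pinus_albi", 8.0),
    ("P1", "Abies_lasi", 10.0),
    ("P2", "Pinus_pond", 20.0),
]

def plot_by_species_matrix(records):
    """Return {plot: {species: total basal area}} with zeros for absent species."""
    species = sorted({sp for _, sp, _ in records})
    plots = sorted({plot for plot, _, _ in records})
    matrix = {plot: {sp: 0.0 for sp in species} for plot in plots}
    for plot, sp, dbh in records:
        matrix[plot][sp] += basal_area_ft2(dbh)
    return matrix

matrix = plot_by_species_matrix(tree_records)
```

Each plot gets a value for every species, with zeros where a species was not recorded. Those zeros are what make it possible to query for plots where a species is present (basal area greater than 0).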

One important note on FIA plot locations: they are intentionally made inaccurate (i.e., “fuzzed” and in some cases “swapped”) to protect the privacy of landowners and forest resources. This complicates some analyses, although studies have shown that many patterns are robust to this fuzzing (but see this paper too).

We will use FIA data to investigate broad patterns of whitebark pine (Pinus albicaulis) in the Greater Yellowstone Ecosystem. Let’s first quickly select which FIA plots have whitebark pine in them. We’ll use “Select By Attributes” to identify every plot where basal area of whitebark pine is greater than 0. Make sure the attribute table of the FIA data is open and click on “Select By Attributes” circled in red below.

In the Expression section choose “Pinus_albi” (Pinus albicaulis, whitebark pine) is greater than 0 and click OK. This will select the FIA plots that have some whitebark pine and turn the plot symbols in the map cyan (the color that usually indicates a feature is selected).

Notice the button and information I have circled in red below. The button filters the attribute table to only those rows that were selected. You can also see how many of the rows were selected from the total. This can be a quick and easy way of calculating a high-level result (i.e., how many or what proportion of your data include a select species, like whitebark pine?).

You can also right-click on any species and sort descending to see which plot has the highest whitebark pine basal area.
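
The selection and sorting steps above amount to a filter and a sort on the attribute table. Here is a minimal Python sketch of the same logic, using a made-up table; the plot_id field and all values are hypothetical, and only the Pinus_albi and Pinus_pond field names come from the lab data.

```python
# Hypothetical attribute table: one row per FIA plot, basal area in ft2 per acre
fia_plots = [
    {"plot_id": 101, "Pinus_albi": 0.0, "Pinus_pond": 35.2},
    {"plot_id": 102, "Pinus_albi": 12.5, "Pinus_pond": 0.0},
    {"plot_id": 103, "Pinus_albi": 48.1, "Pinus_pond": 0.0},
    {"plot_id": 104, "Pinus_albi": 0.0, "Pinus_pond": 0.0},
]

species_field = "Pinus_albi"

# Equivalent of the expression "Pinus_albi is greater than 0"
selected = [row for row in fia_plots if row[species_field] > 0]
proportion_selected = len(selected) / len(fia_plots)

# Equivalent of sorting the column in descending order
ranked = sorted(fia_plots, key=lambda row: row[species_field], reverse=True)
top_plot = ranked[0]

print(f"{len(selected)} of {len(fia_plots)} plots selected")
print(f"Highest {species_field}: plot {top_plot['plot_id']}")
```

Changing species_field to another column name repeats the query for a different species.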

TASK 4 (10 points).

Where is the plot with the highest basal area of whitebark pine?

Choose another tree species and use Select By Attributes to identify the plots where it occurs, then describe their locations. Paste a screenshot of the map with selected FIA plots. If you don’t have a favorite tree to choose, try ponderosa pine (Pinus ponderosa), which has the code Pinus_pond in the data.

Exercise 3: What’s the relationship between tree and vertebrate diversity, ecosystem carbon, and elevation in the Greater Yellowstone Ecosystem?

We have pulled in other datasets that can be combined to ask pretty cool research questions like “what is the relationship between elevation, total ecosystem carbon, total tree basal area, tree richness, and vertebrate richness?” There are lots of conceptual models and hypotheses to describe the causes and consequences of relationships among some of these variables. For instance, do sites with high tree diversity also support higher diversity of terrestrial vertebrates? Do sites with more species store more carbon? Do sites with more carbon support more terrestrial vertebrates? How do these patterns change with elevation? The ecological questions could go on and on with just the few datasets available to use here.

We already have FIA plots overlaid with data on elevation, total forest carbon, vertebrate species richness, etc. I have already calculated tree richness in the FIA data by counting the non-zero columns (i.e., how many species were found to be present in each plot?). Go up to the Analysis tab and click Tools (the icon looks like a toolbox). This will open a Geoprocessing panel on the right.
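
As a brief aside, the tree richness calculation described above (counting the non-zero species columns) can be sketched in Python as follows. The rows, the plot_id field, and the Abies_lasi code are hypothetical; this illustrates the idea and is not the script used to prepare the lab data.

```python
# Hypothetical rows from a plot x species table (basal area per species)
plots = [
    {"plot_id": "P1", "Pinus_albi": 12.5, "Pinus_pond": 0.0, "Abies_lasi": 30.1},
    {"plot_id": "P2", "Pinus_albi": 0.0, "Pinus_pond": 0.0, "Abies_lasi": 0.0},
    {"plot_id": "P3", "Pinus_albi": 4.2, "Pinus_pond": 18.0, "Abies_lasi": 7.7},
]

def tree_richness(row, id_field="plot_id"):
    """Count species columns with basal area greater than zero."""
    return sum(1 for field, value in row.items() if field != id_field and value > 0)

richness_by_plot = {row["plot_id"]: tree_richness(row) for row in plots}
```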

In the search bar, which says Find Tools, type “Extract Multi Values to Points”.

Make sure the Input point features shows the FIA data and select the carbon, vertebrate richness, and elevation raster data. This will essentially extract the values from each raster and append them to the FIA attribute table, allowing us to do more statistical analysis with these data. I’ve tweaked the names in the “Output field name” a bit to clean up the new variable names. Click Run.

If that runs successfully, check the attribute table in the FIA data to see if the new data now appear. The new variables on carbon, vertebrate richness, and elevation will be on the far right side of your attribute table. You’ll need to scroll over to see them.
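
Conceptually, extracting raster values to points means finding which pixel each point falls in and copying that pixel's value into the point's row. Here is a minimal Python sketch of that idea, assuming tiny made-up rasters on a shared grid and hypothetical field names. It ignores details the geoprocessing tool handles for you, such as coordinate systems.

```python
def extract_value(raster, x, y, x_min, y_max, cell_size, nodata=None):
    """Return the raster cell value at (x, y), or nodata if the point is off the grid.

    raster is a list of rows with the top (northernmost) row first.
    """
    col = int((x - x_min) // cell_size)
    row = int((y_max - y) // cell_size)
    if 0 <= row < len(raster) and 0 <= col < len(raster[0]):
        return raster[row][col]
    return nodata

# Hypothetical 3 x 3 rasters sharing one grid (upper-left corner at x=0, y=90; 30 m cells)
elevation = [[2400, 2350, 2300], [2200, 2150, 2100], [2000, 1950, 1900]]
carbon = [[55.0, 60.5, 48.2], [70.1, 66.3, 52.0], [30.4, 28.9, 25.5]]

# Hypothetical plot locations in the same coordinate system as the rasters
plots = [
    {"plot_id": "P1", "x": 10.0, "y": 80.0},
    {"plot_id": "P2", "x": 45.0, "y": 45.0},
    {"plot_id": "P3", "x": 200.0, "y": 10.0},  # falls outside the rasters
]

# Append one new field per raster to each plot row
for plot in plots:
    plot["elev_m"] = extract_value(elevation, plot["x"], plot["y"], 0.0, 90.0, 30.0)
    plot["carbon"] = extract_value(carbon, plot["x"], plot["y"], 0.0, 90.0, 30.0)
```

After the loop, each plot row carries new fields, much like the new columns on the far right of the attribute table. A plot that falls outside the rasters receives a missing value.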

Now, let’s investigate the relationships we asked about above using simple scatter plots, which we can do right in ArcGIS Pro. I do not recommend conducting “real” analyses in ArcGIS Pro, but sometimes a fast data exploration is valuable. To create the scatter plots, right-click on the FIA data, select Create Chart, and choose Scatter Plot.
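
A scatter plot can be paired with a simple numeric summary such as the Pearson correlation coefficient. The Python sketch below uses made-up values for five plots and a hypothetical helper function; it is meant for quick exploration only, in the same spirit as the scatter plots, and is not a substitute for a proper analysis.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for five plots (not taken from the lab data)
tree_richness = [1, 2, 2, 4, 5]
vertebrate_richness = [180, 195, 190, 220, 230]

r = pearson_r(tree_richness, vertebrate_richness)
print(f"Pearson r = {r:.2f}")
```

Values of r near 1 or -1 indicate a strong linear relationship and values near 0 indicate a weak one. Always look at the scatter plot as well, since a single number can hide nonlinear patterns and outliers.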

TASK 5 (10 points).

You can now change which variables are shown on the x- and y-axes using a simple drop down menu. The scatter plot is interactive, allowing you to investigate the geographic locations of conditions plotted along the x- and y-axes. What questions are most interesting to you? Is the relationship between tree and vertebrate richness interesting to you? Check out the relationship. What about elevation and carbon? Ask whatever question you are most interested in with this simple Scatter Plot and show your results with a very short write up.

Next week we will be doing more of this kind of analysis using “raster stacks” in R using the terra, dplyr, and ggplot2 packages.